Preserve line-continuation only in dedent heredocs#3897
Preserve line-continuation only in dedent heredocs#3897
Conversation
Closes ruby#3837 While these lines are whitespace only from a runtime perspective, the line continuation is significant for AST consumers. Sort of a followup to ruby@faab217
|
I'm assuming this has no impact on generated code right? |
|
Yeah, from what I tested this gives the exact same instructions. |
| heredoc_dedent_discard_string_node(pm_parser_t *parser, pm_string_node_t *string_node) { | ||
| if (string_node->unescaped.length == 0) { | ||
| const uint8_t *cursor = parser->start + PM_LOCATION_START(&string_node->content_loc); | ||
| return pm_memchr(cursor, '\\', string_node->content_loc.length, parser->encoding_changed, parser->encoding) == NULL; |
There was a problem hiding this comment.
I know you've checked this thoroughly, but I'm just uncomfortable with this because it seems like it could include other kinds of strings, but by virtue of the dedent algorithm we're cool.
Just to be super clear, are we saying that we want to include the string node if it matches [\s\t]*\\\r?\n, but otherwise not?
If that's the case, I think I'd prefer that we explicitly check that pattern here, without relying on pm_memchr.
There was a problem hiding this comment.
it seems like it could include other kinds of strings, but by virtue of the dedent algorithm we're cool.
Hm, I wouldn't say that is the case, we only take this branch when the unescaped string is empty. That's only possible when the string indeed is dedenting whitespace only, anything else would result in it having some value.
Otherwise, your regexp is pretty much what I am checking for, with the assumption that just checking \ is good enough for the above reason. Otherwise, this is the implementation I came up with:
static inline bool
heredoc_dedent_discard_string_node(pm_parser_t *parser, pm_string_node_t *string_node) {
if (string_node->unescaped.length == 0) {
const uint8_t *start = parser->start + PM_LOCATION_START(&string_node->content_loc);
const uint8_t *end = parser->start + PM_LOCATION_END(&string_node->content_loc);
while (start < end && pm_char_is_inline_whitespace(*start)) start++;
if (start < end && *start == '\\') {
start++;
if (start < end && *start == '\r') {
start++;
}
if (start < end && *start == '\n' && start + 1 == end) {
// The string is whitespace only and ends with a line continuation.
return false;
}
}
// The string may be empty, or all whitespace was removed via dedenting.
return true;
}
// The string has content. It may be whitespace not removed via dedenting or any other characters.
return false;
}Do you prefer this still?
Closes #3837
While these lines are whitespace only from a runtime perspective, the line continuation is significant for AST consumers.
Sort of a followup to faab217